Poster: dingua Date: Jan 23, 2016 9:43pm
Forum: texts Subject: How to get the all list of subjects of one search

To search, we are using the advancedSearch API.
For each item it returns the subjects related to that item.
But on the archive.org website, the right side of the search page lists all the subjects, languages and categories related to what you are searching for, so you can filter the results on them.
Can anyone give me an idea of how I can get all this information using the JSON API?
Thanks.

Poster: stbalbach Date: Jan 24, 2016 6:33am
Forum: texts Subject: Re: How to get the all list of subjects of one search

On the advanced search page:

https://archive.org/advancedsearch.php

On the bottom half, on the left side, is a column with a list of fields to return. Multiple fields can be selected by Ctrl-clicking them.

For example, this returns the description, language and subject fields for 50 results in JSON:

https://archive.org/advancedsearch.php?q=test&fl%5B%5D=description&fl%5B%5D=language&fl%5B%5D=subject&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=50&page=1&output=json&callback=callback&save=yes
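The same request can be assembled in a script rather than through the form. A minimal Python sketch, assuming only the parameters visible in the URL above (q, fl[], rows, page, output); the helper name build_search_url is hypothetical. The callback parameter is left out on the assumption that, without it, the server returns plain JSON instead of wrapping it in callback(...).

```python
from urllib.parse import urlencode

BASE_URL = "https://archive.org/advancedsearch.php"

def build_search_url(query, fields, rows=50, page=1):
    """Build an advancedsearch.php URL (hypothetical helper)."""
    # A list of (key, value) pairs lets fl[] repeat once per requested field;
    # urlencode turns "fl[]" into "fl%5B%5D" as in the URL above.
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("test", ["description", "language", "subject"])
print(url)
```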

This post was modified by stbalbach on 2016-01-24 14:33:42

Poster: dingua Date: Jan 24, 2016 9:51pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

Thanks for the reply.
One more question: if I need to get all the subjects/languages for all the items in a search, do I need to set the "rows" parameter to a high number, like this:

https://archive.org/advancedsearch.php?q=test&fl%5B%5D=description&fl%5B%5D=language&fl%5B%5D=subject&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=999999&page=1&output=json&callback=callback&save=yes

And I still believe the archive website does not do this to get all the topics, as it would take too long to load.

This post was modified by dingua on 2016-01-25 05:51:36

Poster: stbalbach Date: Jan 25, 2016 6:01pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

Hi,

Rows is the number of results per page. The &page parameter specifies which page to return. To get all results, find the total number of results (the numFound field in the JSON response), divide it by 50 (or whatever &rows is), round up, and iterate through the pages until done. If there are 2000 results, there are 40 pages of 50 results each:

&rows=50&page=1
&rows=50&page=2
...
&rows=50&page=40

A rows value of 50 or 100 is probably small enough not to overload the server. It also means you can examine the results as they arrive and quit early if the quality of the search results gets worse towards the end.
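The page arithmetic can be sketched in Python. Rounding up matters: without it, a final partial page (for example result 2001 of 2001) would be skipped. Page numbering starting at 1 follows the &page=1 example above; the function names are hypothetical.

```python
def page_count(num_found, rows=50):
    """Pages needed to cover num_found results; a partial last page counts."""
    # Integer ceiling division: 2000 results -> 40 pages, 2001 -> 41.
    return (num_found + rows - 1) // rows

def page_params(num_found, rows=50):
    """Query-string fragments for every page, numbered from 1."""
    return [f"&rows={rows}&page={page}"
            for page in range(1, page_count(num_found, rows) + 1)]

print(page_count(2000))   # 40
print(page_params(120))   # ['&rows=50&page=1', '&rows=50&page=2', '&rows=50&page=3']
```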

Stephen

Poster: dingua Date: Jan 25, 2016 8:24pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

Hi,
I think I did not express myself well enough.
What I need exactly is this list:
https://copy.com/LmGV4cxeYZSqduSu

The website shows this list as soon as the first page loads.
For example, for American Library the list presents all the topics in the collection (not only those on the pages loaded). This means the site makes a separate request, independent of the one that loads the pages, which returns the list of topics (subjects), languages and collections.

For now I have only one solution in mind (based on your first answer): load all the items of a search, selecting only the fields I need (language, subject, collection), and on the client side parse the JSON, counting all the subjects, languages and collections in the response. But this will take too much time!
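The client-side counting described here is a simple tally over the "docs" of each response. A minimal sketch, assuming each doc is a dict and that a field such as subject may come back either as a single string or as a list of strings (an assumption about the response shape, handled both ways); count_field and the sample docs are hypothetical.

```python
from collections import Counter

def count_field(docs, field):
    """Tally every value of `field` across a list of search-result docs."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # item has no value for this field
        # Assumption: a field holds either one string or a list of strings.
        values = [value] if isinstance(value, str) else value
        counts.update(v.strip() for v in values)
    return counts

# Hypothetical sample docs, shaped like the entries of response["docs"].
docs = [
    {"subject": ["History", "Poetry"], "language": "English"},
    {"subject": "History", "language": ["English", "French"]},
    {"language": "German"},
]
print(count_field(docs, "subject").most_common())   # [('History', 2), ('Poetry', 1)]
print(count_field(docs, "language").most_common())  # [('English', 2), ('French', 1), ('German', 1)]
```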
So my next question: is there a simple way to get the set of subjects of a search, like this web service, which gives the number of archived items of each type:
https://archive.org/index.php?output=json&callback=IAE.topfn

Thanks!

Poster: stbalbach Date: Jan 25, 2016 9:21pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

It sounds like you want the number of works? Here is the JSON of a search on "William". It has:

"numFound":285934

There are 285934 works containing "William".
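Reading numFound from a response takes one extra step when the URL includes callback=callback, as in the earlier examples: the JSON is then expected to arrive wrapped as callback({...}). A minimal sketch, assuming that wrapper form and a Solr-style layout where numFound sits under a top-level "response" object; parse_search_response is hypothetical and the sample string is illustrative, reusing the count quoted above.

```python
import json

def parse_search_response(text):
    """Decode an advancedsearch.php body, stripping a JSONP wrapper if present.

    Assumption: with callback=NAME in the URL the body looks like NAME({...});
    without it the body is plain JSON starting with "{".
    """
    text = text.strip()
    if not text.startswith("{"):
        start, end = text.index("("), text.rindex(")")
        text = text[start + 1:end]
    return json.loads(text)

sample = ('callback({"responseHeader": {"status": 0}, '
          '"response": {"numFound": 285934, "start": 0, "docs": []}})')
data = parse_search_response(sample)
print(data["response"]["numFound"])  # 285934
```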

Poster: dingua Date: Jan 25, 2016 9:42pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

No, I need the set of subjects of a search.

Poster: stbalbach Date: Jan 26, 2016 5:02pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

See post here. You can't get them all in "one search"; you have to walk through the results one page at a time using a script.

1. Download the search with rows=50
2. Get the numFound value
3. Divide numFound by rows, rounding up, to get the number of pages
4. For (i = 1; i <= pages; i++)
     get page i

Here "get page i" is the advanced search URL with the requested field (fl[]) set to "subject", rows=50 and page=i.
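The four steps above can be turned into a short Python script. This is a sketch under several assumptions: the request parameters are the ones from the URLs earlier in the thread (without callback, so the body is plain JSON); the response has a Solr-style "response" object holding numFound and docs; a field may be a single string or a list; and the one-second pause between requests is an arbitrary courtesy choice, not a documented rate limit. The function names are hypothetical, and the fetch function can be swapped out, which keeps the paging logic testable without a network connection.

```python
import json
import time
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://archive.org/advancedsearch.php"

def fetch_page(query, field, rows, page):
    """Fetch one results page as parsed JSON (no callback parameter)."""
    params = [("q", query), ("fl[]", field), ("rows", rows),
              ("page", page), ("output", "json")]
    with urlopen(BASE_URL + "?" + urlencode(params)) as resp:
        return json.load(resp)

def collect_values(query, field="subject", rows=50, fetch=fetch_page, delay=1.0):
    """Walk every page of a search and tally the values of one field."""
    counts = Counter()
    # Steps 1-2: download page 1 and read numFound from it.
    data = fetch(query, field, rows, 1)
    num_found = data["response"]["numFound"]
    # Step 3: number of pages, rounded up so a partial last page is included.
    pages = (num_found + rows - 1) // rows
    # Step 4: pages are numbered from 1; page 1 is already in hand.
    for page in range(1, pages + 1):
        if page > 1:
            time.sleep(delay)  # assumed courtesy pause, not a documented limit
            data = fetch(query, field, rows, page)
        for doc in data["response"]["docs"]:
            value = doc.get(field)
            if value is None:
                continue  # item has no value for this field
            # Assumption: a field holds either one string or a list of strings.
            counts.update([value] if isinstance(value, str) else value)
    return counts
```

Calling collect_values("test").most_common(20) would then list the 20 most frequent subjects. For a search with hundreds of thousands of results this still means thousands of requests, which is the cost discussed above.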



This post was modified by stbalbach on 2016-01-27 01:02:24

Poster: PDpolice Date: Jan 24, 2016 7:33am
Forum: texts Subject: How to get the all list of the Forum posts as they were.

Thanks for the search help.
Here is how to see all the forum posts.

https://archive.org/iathreads/posts-display-new.php?limit=100

Poster: stbalbach Date: Jan 24, 2016 7:49am
Forum: texts Subject: Re: How to get the all list of the Forum posts as they were.

Good to know. This makes monitoring other forums easier without having to click through to each one. I see Grateful Dead is 50% of the traffic :)

Stephen